Abstract

In this paper we consider a system of quadratic equations $|\langle z_j, x\rangle|^2 = b_j$, $j = 1, \ldots, m$, where
$x \in \mathbb{R}^n$ is unknown while normal random vectors $z_j \in \mathbb{R}^n$ and quadratic measurements $b_j \in \mathbb{R}$
are known. The system is assumed to be underdetermined, i.e., $m < n$. We prove that if there
exists a sparse solution $x$, i.e., at most $k$ components of $x$ are non-zero, then by solving a convex
optimization program, we can solve for $x$ up to a multiplicative constant with high probability,
provided that $k \le O(\sqrt{m/\log n})$. On the other hand, we prove that $k \le O(\log n \sqrt{m})$ is necessary
for a class of naive convex relaxations to be exact.

Keywords. $\ell_1$-minimization, Trace minimization, Shor's SDP-relaxation, Compressed Sensing,
PhaseLift, KKT Condition, Approximate Dual Certificate, Golfing Scheme, Random Matrices with 
IID Rows. 

1 Introduction 

1.1 Introduction and the main results 

Convex optimization methods have recently been proven to be very successful in solving some
classes of linear or quadratic algebraic equations. One classical example is compressed sensing,
where a system of underdetermined linear equations can be solved exactly by using an
$\ell_1$-convex relaxation, provided that the unknown vector is sparse. A typical result is as follows:

Compressed Sensing Suppose $A \in \mathbb{R}^{m \times n}$ has IID $\mathcal{N}(0, 1)$ entries and $x_0 \in \mathbb{R}^n$ satisfies $\|x_0\|_0 = k$
(only $k$ components of $x_0$ are non-zero). If we have linear measurements $b = Ax_0$, then we can
recover $x_0$ exactly with high probability by solving

$$\begin{array}{ll} \text{minimize} & \|x\|_1 \\ \text{subject to} & b = Ax, \end{array}$$

provided $k \le O(m / \log(n/m))$.

Another example is a recently proposed semidefinite programming framework for phase retrieval, 
called PhaseLift [5], by which a signal can be exactly recovered, up to a multiplicative constant,
from quadratic measurements. The SDP is a combination of trace minimization and Shor's SDP- 
relaxation for quadratic constraints. We review the results in [U[5] below: 

PhaseLift Fix a signal $x \in \mathbb{R}^n$. Let $z_j \in \mathbb{R}^n$ be IID standard normal random vectors, and
suppose $b_j$, $j = 1, \ldots, m$ are defined as follows:

$$b_j = |\langle z_j, x\rangle|^2, \quad j = 1, \ldots, m. \qquad (1.2)$$

If we assume $m \ge C_0 n$ for some numerical constant $C_0$, then with high probability, $xx^T$ is the
unique solution to the following convex optimization problem:

$$\begin{array}{ll} \text{minimize} & \operatorname{Tr}(X) \\ \text{subject to} & z_j^T X z_j = b_j, \quad j = 1, \ldots, m, \\ & X \succeq 0. \end{array} \qquad (1.3)$$

Notice that $xx^T$ is feasible since $xx^T \succeq 0$ and

$$z_j^T (xx^T) z_j = |\langle z_j, x\rangle|^2 = b_j, \quad j = 1, \ldots, m.$$

There is an inherent ambiguity to the solution of (1.2), since multiplying by a phase factor ($\pm 1$
in the real case) does not change measurements. From now on, we only consider solutions modulo
multiplication by phase.
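
The lifting identity $z_j^T(xx^T)z_j = |\langle z_j, x\rangle|^2$ and the sign ambiguity can be checked numerically. The following is a minimal sketch in plain Python; the helper names `quadratic_measurements` and `lifted_measurements`, the dimensions, the seed and the test vector are illustrative assumptions and do not come from the paper.

```python
import random

def quadratic_measurements(Z, x):
    """Return b_j = <z_j, x>^2 for each measurement vector z_j (a row of Z)."""
    return [sum(zi * xi for zi, xi in zip(z, x)) ** 2 for z in Z]

def lifted_measurements(Z, X):
    """Return z_j^T X z_j for each measurement vector z_j (a row of Z)."""
    n = len(X)
    return [sum(z[a] * X[a][b] * z[b] for a in range(n) for b in range(n)) for z in Z]

rng = random.Random(0)                      # fixed seed, for reproducibility only
n, m = 6, 4                                 # hypothetical sizes with m < n
x = [0.6, 0.0, -0.8, 0.0, 0.0, 0.0]         # a 2-sparse unit vector
Z = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]

b = quadratic_measurements(Z, x)
X = [[xa * xb for xb in x] for xa in x]     # the lifted matrix x x^T
b_lifted = lifted_measurements(Z, X)        # linear in X, same values as b
b_flipped = quadratic_measurements(Z, [-xi for xi in x])  # measurements of -x
```

The point of the lift is that the measurements are quadratic in $x$ but linear in $X = xx^T$, which is what makes the constraints of (1.3) convex.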

In this paper, we consider model (1.2) in the case that $m \ll n$. In this regime, (1.2) does not
yield injective measurements. In fact, each equation in (1.2) is the union of two linear equations
by assigning different signs, so generally we have $2^m$ solutions. However, if we assume that the
unknown vector $x$ is $k$-sparse, then under some mild conditions on the number of measurements,
system (1.2) becomes well-posed:

Theorem 1.1 Let $x \in \mathbb{R}^n$ be a $k$-sparse real signal, $a_i \in \mathbb{R}^n$, $i = 1, \ldots, m_1$ be generic real
measurement vectors and let $y \in \mathbb{C}^n$ be a $k$-sparse complex signal and $b_i \in \mathbb{C}^n$, $i = 1, \ldots, m_2$ be generic
complex measurement vectors. Then $m_1 \ge 4k - 1$, $m_2 \ge 8k - 2$ quadratic measurements $\{\langle a_i, x\rangle^2\}_{i=1}^{m_1}$,
$\{|\langle b_i, y\rangle|^2\}_{i=1}^{m_2}$ are sufficient to recover $x$ and $y$ modulo phase.

By generic we mean an open dense subset of the set of all $m$-element frames in $\mathbb{R}^n$ or $\mathbb{C}^n$.

Proof We only prove the complex case, since the real case is similar. Assume that there is a
$k$-sparse $y' \in \mathbb{C}^n$ such that $|\langle b_i, y'\rangle|^2 = |\langle b_i, y\rangle|^2$, $i = 1, \ldots, m_2$, where $m_2 \ge 8k - 2$. Let $T$ be the union of the
supports of $y$ and $y'$. Clearly $|T| \le 2k$. Then

$$|\langle b_i, y\rangle|^2 = |\langle b_i, y'\rangle|^2, \quad i = 1, \ldots, m_2,$$

which is equivalent to

$$|\langle b_{iT}, y_T\rangle|^2 = |\langle b_{iT}, y'_T\rangle|^2, \quad i = 1, \ldots, m_2,$$

where $v_T$ means the restriction of $v$ to the support $T$. The genericity of $b_i$, $i = 1, \ldots, m_2$ implies
the genericity of $b_{iT}$, $i = 1, \ldots, m_2$. Then since $m_2 \ge 4(2k) - 2 = 8k - 2$ we have $y_T = e^{i\psi} y'_T$ for
some real number $\psi$ by Theorem 3.1 in pQ. Therefore $y = e^{i\psi} y'$. ■
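
To make concrete the earlier remark that each quadratic equation in (1.2) splits into two linear equations, the sketch below enumerates all sign patterns for a toy system with $n = m = 2$. The measurement vectors, the signal and the helper `solve_2x2` are hypothetical choices made only for illustration.

```python
from itertools import product

def solve_2x2(Z, rhs):
    """Solve the 2x2 linear system Z x = rhs by Cramer's rule."""
    (a, b), (c, d) = Z
    det = a * d - b * c
    return [(rhs[0] * d - b * rhs[1]) / det, (a * rhs[1] - rhs[0] * c) / det]

Z = [[1.0, 2.0], [3.0, -1.0]]               # two measurement vectors (hypothetical)
x_true = [1.0, 1.0]
inner = [sum(zi * xi for zi, xi in zip(z, x_true)) for z in Z]
b = [t * t for t in inner]                  # quadratic measurements b_j = <z_j, x>^2

# Each equation <z_j, x>^2 = b_j is the union of <z_j, x> = +sqrt(b_j) and
# <z_j, x> = -sqrt(b_j); every sign pattern gives one linear system.
solutions = []
for signs in product([1.0, -1.0], repeat=len(Z)):
    rhs = [s * abs(t) for s, t in zip(signs, inner)]
    solutions.append(solve_2x2(Z, rhs))

def canonical(v):
    """Identify v with -v by making the first entry non-negative."""
    flip = -1.0 if v[0] < 0 else 1.0
    return tuple(round(flip * t, 9) for t in v)

classes = {canonical(v) for v in solutions}  # solutions modulo a global sign
```

Here the $2^m = 4$ sign patterns give four solutions, which collapse to two classes modulo sign, so the measurements are not injective without further structure such as sparsity.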

Injectivity of the measurements of course doesn't imply that efficient recovery is possible. Yet,
inspired by the success of convex relaxations in compressed sensing and phase retrieval, it is natural
to leverage the sparsity assumption to try to efficiently recover signals from fewer than $n$ intensity
measurements. A convex formulation in this direction, which, to the best of our knowledge, was
first proposed in [8] to solve (1.2), is the following program:

$$\begin{array}{ll} \text{minimize} & \|X\|_1 + \lambda \operatorname{Tr}(X) \\ \text{subject to} & z_j^T X z_j = b_j, \quad j = 1, \ldots, m, \\ & X \succeq 0. \end{array} \qquad (1.4)$$
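
As a small illustration of the objective in (1.4), the sketch below evaluates $\|X\|_1 + \lambda \operatorname{Tr}(X)$ at the lifted point $X = xx^T$, where it reduces to $\|x\|_1^2 + \lambda \|x\|_2^2$. The function name `objective`, the test vector and the value of `lam` are assumptions chosen only for the example.

```python
def objective(X, lam):
    """Evaluate ||X||_1 + lam * Tr(X) for a square matrix X given as a list of rows."""
    l1 = sum(abs(v) for row in X for v in row)       # entrywise l1 norm
    trace = sum(X[i][i] for i in range(len(X)))
    return l1 + lam * trace

x = [0.6, 0.0, -0.8, 0.0]            # hypothetical 2-sparse unit vector
lam = 2.5                            # illustrative value only
X = [[a * b for b in x] for a in x]  # lifted matrix x x^T

value = objective(X, lam)
closed_form = sum(abs(a) for a in x) ** 2 + lam * sum(a * a for a in x)
```

The entrywise $\ell_1$ term promotes sparsity of $X$ while the trace term promotes low rank, which is why the two are combined.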

The next theorem shows that when $z_j$ are IID standard normal random vectors, the solution to
(1.4), for an appropriate choice of $\lambda$, is exactly $xx^T$, provided that $k \le O(\sqrt{m/\log n})$.



Theorem 1.2 Fix a signal $x \in \mathbb{R}^n$ with $\|x\|_2 = 1$ and $\|x\|_0 = k$, i.e., only $k$ components of $x$
are non-zero. Let $z_j \in \mathbb{R}^n$ be IID standard normal random vectors, and suppose $b_j$, $j = 1, \ldots, m$
are defined as in (1.2). Then the solution to the convex program (1.4) is exact with probability at
least $1 - (2\log n + 3)\big(4e^{-\gamma \frac{m}{2\log(n)+3}} + \frac{1}{n^3}\big) - (5 + 2n^2)e^{-\gamma m}$, provided $\lambda > \sqrt{k}\|x\|_1 + 1$, $\lambda < \frac{n^2}{4}$ and
$m \ge C_0 \lambda^2 \log n$. Here $C_0$ and $\gamma$ are numerical constants.



Remark 1: By choosing $\lambda = \sqrt{\frac{m}{C_0 \log n}}$, we have exact recovery with probability at least
$1 - (2\log n + 3)\big(4e^{-\gamma \frac{m}{2\log(n)+3}} + \frac{1}{n^3}\big) - (5 + 2n^2)e^{-\gamma m}$ if the number of measurements obeys $m \ge O(\|x\|_1^2 k \log n)$.
Moreover, by choosing $x$ to be a $k$-sparse vector with components $x_i = \pm\frac{1}{\sqrt{k}}$ on its support, this reads
$m \ge O(k^2 \log n)$.
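
The arithmetic behind Remark 1 can be sketched as follows: for a flat $k$-sparse unit vector, $\|x\|_1 = \sqrt{k}$, so the threshold $\sqrt{k}\|x\|_1 + 1$ on $\lambda$ equals $k + 1$ and the requirement $m \ge C_0 \lambda^2 \log n$ scales like $k^2 \log n$. The helper names below are hypothetical, the natural logarithm is assumed, and the constant $C_0$ is not specified in the text, so only the scaling factor is computed.

```python
import math

def flat_sparse_vector(n, k):
    """k-sparse unit vector whose non-zero entries all equal 1/sqrt(k)."""
    return [1.0 / math.sqrt(k)] * k + [0.0] * (n - k)

def lambda_lower_bound(x):
    """Return sqrt(k) * ||x||_1 + 1, the lower threshold on lambda in Theorem 1.2."""
    k = sum(1 for v in x if v != 0.0)
    l1 = sum(abs(v) for v in x)
    return math.sqrt(k) * l1 + 1.0

n, k = 100, 9                              # hypothetical sizes
x = flat_sparse_vector(n, k)
lam_min = lambda_lower_bound(x)            # equals k + 1 for this x
scaling = lam_min ** 2 * math.log(n)       # m must exceed C_0 times this quantity
```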



Remark 2: In [8], the authors operate under an assumption that the sampling operator satisfies
a generalization of the Restricted Isometry Property and mutual coherence, while in Theorem
1.2 of our paper we assume the $z_j$'s are IID standard normal vectors. In our setting the mutual
coherence of the sampling operator defined in [8] will be on the order of $O(1)$, since the diagonal
entries of $z_j z_j^T$ are always $\chi^2$ random variables. Applying the result in [8] we get $k = O(1)$ in our
setting, which is a much smaller range of sparsity than considered in the result of the above theorem.

The conclusion of Theorem 1.2 is far more restrictive than that of Theorem 1.1, so one may ponder
whether Theorem 1.2 is optimal. The following result shows that indeed there is a substantial gap between
solving (TO]) and (TO]) . 

Theorem 1.3 Under the setting of Theorem 1.2, assuming $4 \le k \le m \le \frac{n}{40 \log n}$, then there is
an event E with probability at least 1 — — me -0- 09n + - 09fc + - 79m ; such that the following property 
holds: If there exists a $\lambda \in \mathbb{R}$ such that $xx^T$ is a minimizer of (1.4), then we have

k _ 2 max(||a;||j-A;/2,0) 2 \ 
4 500 log 2 n J ' 

Remark: Taking $x$ to be a $k$-sparse vector with components $x_i = \pm\frac{1}{\sqrt{k}}$ on its support, this reads $m \ge O(k^2 / \log^2 n)$.
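
The scaling in this remark can be reproduced from the term $\max(\|x\|_1^2 - k/2, 0)^2 / (500 \log^2 n)$ appearing in Theorem 1.3: for a flat $k$-sparse unit vector, $\|x\|_1^2 = k$, so the term equals $k^2 / (2000 \log^2 n)$. The sketch below assumes the natural logarithm; the function name `second_term` and the sizes are illustrative.

```python
import math

def second_term(x, n):
    """Evaluate max(||x||_1^2 - k/2, 0)^2 / (500 log^2 n) for a vector x in R^n."""
    k = sum(1 for v in x if v != 0.0)
    l1_sq = sum(abs(v) for v in x) ** 2
    return max(l1_sq - k / 2.0, 0.0) ** 2 / (500.0 * math.log(n) ** 2)

n, k = 1000, 16                                       # hypothetical sizes
x = [1.0 / math.sqrt(k)] * k + [0.0] * (n - k)        # flat k-sparse unit vector
value = second_term(x, n)
flat_formula = k ** 2 / (2000.0 * math.log(n) ** 2)   # closed form for flat vectors
```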

This theorem obtains sharp theoretical results on the performance of (1.4) in the Gaussian quadratic
measurement setting, which may be surprising since it implies that there is a substantial gap between
the sufficient number of measurements for injectivity and the necessary number of measurements
for recovery via a class of natural convex relaxations.



1.2 Definitions and notations 

In this section we introduce some useful definitions and notations, which will be used in the proofs
of Theorems 1.2 and 1.3. In this paper vectors and matrices are boldfaced while scalars are not.

For any positive integer $n_0$, denote $[n_0] = \{1, \ldots, n_0\}$. Let $G = \{i \in [n] : x_i \neq 0\}$ be the support
of $x$ and $B = \{i \in [n] : x_i = 0\}$ be its complement. Without loss of generality, we assume
$G = \{1, \ldots, k\}$. Define the subspaces of symmetric matrices $\Omega = \{X \mid X_{ij} = 0,\ i > k \text{ or } j > k,\ X = X^T\}$,
$\Gamma = \{X \mid X_{ij} = 0,\ i \le k \text{ or } j \le k,\ X = X^T\}$ and $T = \{xy^T + yx^T,\ y \in \mathbb{R}^n\}$. In the space
of symmetric matrices, we define the inner product $\langle X, Y\rangle = \operatorname{Tr}(XY)$. Then for any subspace
of symmetric matrices $R$, we denote by $R^\perp$ its orthogonal complement under such an inner product.

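For the given random vectors $z_j$, $j = 1, \ldots, m$, let $\mathcal{A} : \mathbb{R}^{n \times n} \to \mathbb{R}^m$ be the linear operator $\mathcal{A}(X) =
\{\operatorname{Tr}(z_i z_i^T X)\}_{i \in [m]}$ for any symmetric matrix $X$. Hence its adjoint is $\mathcal{A}^*(y) = \sum_{i \in [m]} y_i z_i z_i^T$.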
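
The adjoint relation $\langle \mathcal{A}(X), y\rangle = \langle X, \mathcal{A}^*(y)\rangle$ can be checked directly on a small instance. The sketch below is plain Python with hypothetical helper names (`A`, `A_adjoint`, `inner`), small illustrative sizes and a fixed seed.

```python
import random

def A(Z, X):
    """A(X)_i = Tr(z_i z_i^T X) = z_i^T X z_i for each measurement vector z_i."""
    n = len(X)
    return [sum(z[a] * X[a][b] * z[b] for a in range(n) for b in range(n)) for z in Z]

def A_adjoint(Z, y):
    """A*(y) = sum_i y_i z_i z_i^T, returned as a list of rows."""
    n = len(Z[0])
    out = [[0.0] * n for _ in range(n)]
    for yi, z in zip(y, Z):
        for a in range(n):
            for b in range(n):
                out[a][b] += yi * z[a] * z[b]
    return out

def inner(X, Y):
    """<X, Y> = Tr(XY); for symmetric matrices this is the sum of entrywise products."""
    n = len(X)
    return sum(X[a][b] * Y[a][b] for a in range(n) for b in range(n))

rng = random.Random(1)
n, m = 4, 3
Z = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
M = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
X = [[0.5 * (M[a][b] + M[b][a]) for b in range(n)] for a in range(n)]  # symmetrize
y = [rng.gauss(0.0, 1.0) for _ in range(m)]

lhs = sum(u * v for u, v in zip(A(Z, X), y))   # <A(X), y>
rhs = inner(X, A_adjoint(Z, y))                # <X, A*(y)>
```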

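For a symmetric matrix $X$, we write $X_T$ for the orthogonal projection of $X$ onto $T$, and similarly
for $X_{T^\perp}$, $X_\Omega$, $X_{\Omega^\perp}$, $X_{\Gamma^\perp}$, $X_{\Omega \cap T}$ and so on. For a vector $v \in \mathbb{R}^n$, we define $v_G = \langle v, e_1\rangle e_1 + \ldots +
\langle v, e_k\rangle e_k$ and $v_B = v - v_G$. Here $(e_1, \ldots, e_n)$ is the standard basis of $\mathbb{R}^n$.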

Denote by $\|y\|_p$ the $\ell_p$ norm of a vector $y$, where $p$ could be 0, 1 or 2. Let $\|X\|$ and $\|X\|_F$
be the spectral and Frobenius norms of a matrix $X$, respectively. Moreover, let $\|X\|_\infty$ and $\|X\|_1$
be the maximum and the summation of absolute values of all entries of $X$ respectively, i.e., they
represent the $\ell_\infty$ and $\ell_1$ norms of the vectorizations of matrices.

2 The proof of Theorem 1.2

In this section we will prove Theorem 1.2. First we will cite and prove some supporting lemmas.
Then we prove that it suffices to construct an approximate dual certificate matrix to the primal
convex optimization problem. Finally we use a modification of the golfing scheme to construct
such an approximate dual certificate with high probability. Both the idea of the approximate dual
certificate and the golfing scheme are originally due to David Gross' work [7] in matrix completion.



m > min 
2.1 Preliminaries

In this section we establish some useful properties of $\mathcal{A}$.

Lemma 2.1 ([5]) There is an event $E$ of probability at least $1 - 5e^{-\gamma_0 m}$ such that on $E$, any
positive symmetric matrix obeys

$$(1 - 1/8)\operatorname{Tr}(X_\Omega) \le m^{-1}\|\mathcal{A}(X_\Omega)\|_1 \le (1 + 1/8)\operatorname{Tr}(X_\Omega), \qquad (2.1)$$

and any symmetric rank-2 matrix obeys

$$m^{-1}\|\mathcal{A}(X_\Omega)\|_1 \ge 0.94(1 - 1/8)\|X_\Omega\|. \qquad (2.2)$$

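Lemma 2.2 There is an event $E$ of probability at least $1 - 2n^2 e^{-\gamma_0 m}$ such that on $E$, any symmetric
matrix obeys

$$m^{-1}\|\mathcal{A}(X)\|_1 \le \frac{9}{8}\|X\|_1. \qquad (2.3)$$

Proof By direct calculation, we have

$$\frac{1}{m}\|\mathcal{A}(X)\|_1 = \frac{1}{m}\sum_{j=1}^m |\langle X, z_j z_j^T\rangle| \le \frac{1}{m}\sum_{j=1}^m \sum_{a,b} |X_{ab} z_{ja} z_{jb}| \le \max_{a,b} \frac{1}{m}\Big(\sum_{j=1}^m |z_{ja} z_{jb}|\Big)\|X\|_1.$$

For fixed $a, b$, the variables $|z_{ja} z_{jb}|$, $j = 1, \ldots, m$ are IID sub-exponential variables with expectation $1$ or $\frac{2}{\pi}$ and have finite
$\psi_1$-norm. By Proposition 5.16 of [9], we have

$$\max_{a,b} \frac{1}{m}\Big(\sum_{j=1}^m |z_{ja} z_{jb}|\Big) \le 9/8$$

with probability at least $1 - 2n^2 e^{-\gamma_0 m}$. On this event we have $m^{-1}\|\mathcal{A}(X)\|_1 \le \frac{9}{8}\|X\|_1$. ■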

2.2 Exact recovery by the existence of an approximate dual certificate. 

In the classical theory of semidefinite programming, the existence of an exact dual certificate can 
be used to prove that a specific point is the solution to the primal problem. By using an idea in [7], 
in order to prove Theorem 1.2 it suffices to prove the existence of an approximate dual certificate.

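Lemma 2.3 Denote $X_0 = \lambda xx^T + \mathcal{P}_T(\mathrm{sgn}(x)\,\mathrm{sgn}(x)^T)$. Suppose there exists $Y = v_1 z_1 z_1^T + \ldots +
v_m z_m z_m^T$ for some real numbers $v_1, \ldots, v_m$ satisfying $\|Y_{T \cap \Omega} - X_0\|_F \le \frac{\|X_0\|_F}{6n^2}$, $\|Y_{T^\perp \cap \Omega}\| \le \frac{\|X_0\|_F}{5}$
and $\|Y_{\Omega^\perp}\|_\infty \le \frac{C\sqrt{\log n}}{\sqrt{m}}\|X_0\|_F$, with some numerical constant $C$. Then assuming that $\mathcal{A}$ satisfies
properties (2.1), (2.2) and (2.3), we have that $xx^T$ is the unique solution to the convex program (1.4),
provided that $\lambda > \sqrt{k}\|x\|_1 + 1$, $\lambda < \frac{n^2}{4}$ and $m \ge 64C^2\lambda^2 \log n$.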
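Proof Let $X$ be the solution to the convex program (1.4) and let $H = X - xx^T$. Then by the
feasibility condition of the convex program (1.4), we have

$$\mathcal{A}(H) = 0, \qquad (2.4)$$

and

$$xx^T + H \succeq 0. \qquad (2.5)$$

By inequality (2.5), we have

$$H_{T^\perp \cap \Omega} \succeq 0, \quad H_\Gamma \succeq 0 \quad \text{and} \quad H_{T^\perp} \succeq 0. \qquad (2.6)$$

By equality (2.4), we have $\mathcal{A}(H_{T \cap \Omega}) = -\mathcal{A}(H_{T^\perp \cap \Omega} + H_{\Omega^\perp})$. Then by (2.1), (2.2), (2.3) and (2.6), we have

$$\begin{aligned} \|H_{T \cap \Omega}\| &\le \frac{1}{0.94\,(1 - 1/8)\,m}\|\mathcal{A}(H_{T \cap \Omega})\|_1 \\ &\le \frac{1.3}{m}\|\mathcal{A}(H_{T \cap \Omega})\|_1 \\ &\le \frac{1.3}{m}\big(\|\mathcal{A}(H_{T^\perp \cap \Omega})\|_1 + \|\mathcal{A}(H_{\Omega^\perp})\|_1\big) \\ &\le 1.3 \times (9/8)\big(\operatorname{Tr}(H_{T^\perp \cap \Omega}) + \|H_{\Omega^\perp}\|_1\big). \end{aligned}$$

Since $\operatorname{rank}(H_{T \cap \Omega}) \le 2$, we have

$$\|H_{T \cap \Omega}\|_F \le \sqrt{2}\,\|H_{T \cap \Omega}\| \le 2.5\big(\operatorname{Tr}(H_{T^\perp \cap \Omega}) + \|H_{\Omega^\perp}\|_1\big). \qquad (2.7)$$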

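Now let's see what inequalities about $H$ we can get from the objective function. Since both $X$ and
$xx^T$ are feasible and $X$ is the minimizer, we have

$$\|X\|_1 + \lambda\operatorname{Tr}(X) \le \|xx^T\|_1 + \lambda\operatorname{Tr}(xx^T).$$

Also, since

$$\begin{aligned} \|X\|_1 + \lambda\operatorname{Tr}(X) &= \|xx^T + H\|_1 + \lambda\operatorname{Tr}(xx^T + H) \\ &\ge \|xx^T\|_1 + \langle \mathrm{sgn}(x)\,\mathrm{sgn}(x)^T, H\rangle + \|H_{\Omega^\perp}\|_1 + \lambda\operatorname{Tr}(xx^T) + \lambda\operatorname{Tr}(H), \end{aligned}$$

we have

$$\langle \mathrm{sgn}(x)\,\mathrm{sgn}(x)^T, H\rangle + \|H_{\Omega^\perp}\|_1 + \lambda\operatorname{Tr}(H) \le 0.$$

This implies

$$\langle \mathcal{P}_T(\mathrm{sgn}(x)\,\mathrm{sgn}(x)^T) + \lambda xx^T, H_T\rangle + \langle \mathcal{P}_{T^\perp}(\mathrm{sgn}(x)\,\mathrm{sgn}(x)^T), H_{T^\perp}\rangle + \|H_{\Omega^\perp}\|_1 + \lambda\operatorname{Tr}(H_{T^\perp}) \le 0.$$

It is easy to see that $\mathcal{P}_{T^\perp}(\mathrm{sgn}(x)\,\mathrm{sgn}(x)^T)$ is positive semidefinite and combining with (2.6), we get

$$\langle \mathcal{P}_{T^\perp}(\mathrm{sgn}(x)\,\mathrm{sgn}(x)^T), H_{T^\perp}\rangle \ge 0,$$

which implies

$$\langle X_0, H_{T \cap \Omega}\rangle + \|H_{\Omega^\perp}\|_1 + \lambda\operatorname{Tr}(H_{T^\perp}) \le 0.$$

Notice that $\operatorname{Tr}(H_{T^\perp}) = \operatorname{Tr}(H_{T^\perp \cap \Omega}) + \operatorname{Tr}(H_\Gamma)$. By (2.6) and $\lambda > 0$, we have

$$\langle X_0, H_{T \cap \Omega}\rangle + \|H_{\Omega^\perp}\|_1 + \lambda\operatorname{Tr}(H_{T^\perp \cap \Omega}) \le 0. \qquad (2.8)$$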

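By the construction of the approximate dual certificate $Y$, we know $Y = \mathcal{A}^*(v)$, which implies
$\langle H, Y\rangle = \langle \mathcal{A}(H), v\rangle = 0$. Then we have

$$\langle H_{T \cap \Omega}, Y_{T \cap \Omega} - X_0\rangle + \langle H_{T \cap \Omega}, X_0\rangle + \langle H_{T^\perp \cap \Omega}, Y_{T^\perp \cap \Omega}\rangle + \langle H_{\Omega^\perp}, Y_{\Omega^\perp}\rangle = 0.$$

By the assumed properties of $Y$, we have

$$\frac{\|X_0\|_F}{6n^2}\|H_{T \cap \Omega}\|_F + \langle H_{T \cap \Omega}, X_0\rangle + \frac{\|X_0\|_F}{5}\operatorname{Tr}(H_{T^\perp \cap \Omega}) + \frac{C\sqrt{\log n}}{\sqrt{m}}\|X_0\|_F\,\|H_{\Omega^\perp}\|_1 \ge 0.$$

By (2.8), we have

$$\frac{\|X_0\|_F}{6n^2}\|H_{T \cap \Omega}\|_F \ge \Big(\lambda - \frac{\|X_0\|_F}{5}\Big)\operatorname{Tr}(H_{T^\perp \cap \Omega}) + \Big(1 - \frac{C\sqrt{\log n}}{\sqrt{m}}\|X_0\|_F\Big)\|H_{\Omega^\perp}\|_1. \qquad (2.9)$$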

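Since

$$\mathcal{P}_T(\mathrm{sgn}(x)\,\mathrm{sgn}(x)^T) = \|x\|_1\big(x\,\mathrm{sgn}(x)^T + \mathrm{sgn}(x)\,x^T\big) - \|x\|_1^2\, xx^T,$$

we have

$$\|X_0\|_F = \|\lambda xx^T + \mathcal{P}_T(\mathrm{sgn}(x)\,\mathrm{sgn}(x)^T)\|_F \le \lambda + \|x\|_1^2 + 2\sqrt{k}\,\|x\|_1.$$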
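
The projection identity for $\mathcal{P}_T(\mathrm{sgn}(x)\,\mathrm{sgn}(x)^T)$ can be verified on a small example. The sketch below assumes the standard formula $\mathcal{P}_T(X) = PX + XP - PXP$ with $P = xx^T$ for a unit vector $x$; the helper names and the test vector are hypothetical. The triangle inequality then gives $\|\mathcal{P}_T(\mathrm{sgn}(x)\,\mathrm{sgn}(x)^T)\|_F \le 2\sqrt{k}\|x\|_1 + \|x\|_1^2$, which the code also checks.

```python
import math

def outer(u, v):
    """Return the rank-one matrix u v^T as a list of rows."""
    return [[a * b for b in v] for a in u]

def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)] for i in range(n)]

def project_T(X, x):
    """P_T(X) = P X + X P - P X P with P = x x^T, assuming ||x||_2 = 1."""
    P = outer(x, x)
    PX, XP = matmul(P, X), matmul(X, P)
    PXP = matmul(PX, P)
    n = len(x)
    return [[PX[i][j] + XP[i][j] - PXP[i][j] for j in range(n)] for i in range(n)]

def frobenius(X):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(v * v for row in X for v in row))

x = [0.6, -0.8, 0.0]                 # unit vector with k = 2 non-zero entries
s = [1.0, -1.0, 0.0]                 # sgn(x), with sgn(0) = 0
k = 2
l1 = sum(abs(v) for v in x)          # ||x||_1 = 1.4

lhs = project_T(outer(s, s), x)
xs, sx, xx = outer(x, s), outer(s, x), outer(x, x)
rhs = [[l1 * (xs[i][j] + sx[i][j]) - l1 ** 2 * xx[i][j] for j in range(3)] for i in range(3)]
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))
norm_bound = 2.0 * math.sqrt(k) * l1 + l1 ** 2
```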
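Then together with the assumptions $\lambda > \sqrt{k}\|x\|_1 + 1$, $\lambda < \frac{n^2}{4}$ and $m \ge 64C^2\lambda^2 \log n$, we have

$$\frac{\|X_0\|_F}{6n^2} \le \frac{1}{3}\Big(\lambda - \frac{\|X_0\|_F}{5}\Big) \quad \text{and} \quad \frac{\|X_0\|_F}{6n^2} \le \frac{1}{3}\Big(1 - \frac{C\sqrt{\log n}}{\sqrt{m}}\|X_0\|_F\Big),$$

by direct calculation. Therefore, by (2.9),

$$\|H_{T \cap \Omega}\|_F \ge 3\big(\operatorname{Tr}(H_{T^\perp \cap \Omega}) + \|H_{\Omega^\perp}\|_1\big). \qquad (2.10)$$

Inequalities (2.7) and (2.10) give $H_{T \cap \Omega} = 0$, and then by (2.10), we have $H_{T^\perp \cap \Omega} = 0$ and $H_{\Omega^\perp} = 0$.
Hence $H = 0$, which implies $xx^T$ is the unique minimizer of the convex program (1.4). ■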

2.3 Key lemma 

The following lemma will be essential for the construction of a desirable dual certificate: 

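Lemma 2.4 For any fixed $X \in T \cap \Omega$, we have $\operatorname{rank}(X) \le 2$. Consider an eigenvalue decomposition
$X = \lambda_1 u_1 u_1^T + \lambda_2 u_2 u_2^T$, where $\|u_1\|_2 = \|u_2\|_2 = 1$, $u_1^T u_2 = 0$ and both $u_1$ and $u_2$ are
supported on $G$. Define

$$Y = f(\lambda_1, \lambda_2, u_1, u_2) := \frac{1}{m(\beta_4 - \beta_2)}\sum_{j=1}^m \Big(\lambda_1\big(|z_{jG}^T u_1|^2 1_{\{|z_{jG}^T u_1| \le 3\}} - \beta_2\big) + \lambda_2\big(|z_{jG}^T u_2|^2 1_{\{|z_{jG}^T u_2| \le 3\}} - \beta_2\big)\Big) z_j z_j^T.$$

Here we define $\beta_2 = \mathbb{E}\, z^2 1_{\{|z| \le 3\}} \approx 0.9707$, $\beta_4 = \mathbb{E}\, z^4 1_{\{|z| \le 3\}} \approx 2.6728$, where $z$ is a standard
normal variable. Then with probability at least $1 - 4e^{-\gamma m} - 1/n^3$,

$$\|Y_{T \cap \Omega} - X\|_F \le \frac{1}{5}\|X\|_F, \quad \|Y_{T^\perp \cap \Omega}\| \le \frac{1}{10}\|X\|_F, \quad \|Y_{\Omega^\perp}\|_\infty \le \frac{C_0\sqrt{\log n}}{\sqrt{m}}\|X\|_F,$$

provided $m \ge C_1 k$. Here $\gamma$, $C_0$ and $C_1$ are numerical constants.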
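
The truncated moments $\beta_2$ and $\beta_4$ have closed forms obtained by integrating by parts against the Gaussian density, so the quoted values can be reproduced with the standard library. In the sketch below the function name `truncated_moments` and its parameter `a` (the truncation level, 3 in the lemma) are illustrative.

```python
import math

def truncated_moments(a=3.0):
    """Return (E[z^2 1{|z|<=a}], E[z^4 1{|z|<=a}]) for a standard normal z.

    Integration by parts gives
        E[z^2 1{|z|<=a}] = P(|z|<=a) - 2 a phi(a),
        E[z^4 1{|z|<=a}] = 3 P(|z|<=a) - 2 (a^3 + 3 a) phi(a),
    where phi is the standard normal density.
    """
    phi = math.exp(-a * a / 2.0) / math.sqrt(2.0 * math.pi)
    mass = math.erf(a / math.sqrt(2.0))          # P(|z| <= a)
    beta2 = mass - 2.0 * a * phi
    beta4 = 3.0 * mass - 2.0 * (a ** 3 + 3.0 * a) * phi
    return beta2, beta4

beta2, beta4 = truncated_moments(3.0)
```

Both values agree with the constants stated in the lemma to four decimal places, and $\beta_4 - \beta_2 > 0$, so the normalization in the definition of $Y$ is well defined.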

Before proving Lemma 2.4 we need to prove the following supporting lemma:

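Lemma 2.5 Suppose $z_j \in \mathbb{R}^n$, $j = 1, \ldots, m$ are IID $\mathcal{N}(0, I_{n \times n})$ random vectors, and $u$ is any fixed
vector with unit 2-norm, i.e., $\|u\|_2 = 1$. Then for any fixed $\epsilon > 0$, there exist constants $\gamma(\epsilon)$ and
$C_0(\epsilon)$ satisfying

$$\Big\|\frac{1}{m}\sum_{j=1}^m \big(|z_j^T u|^2 1_{\{|z_j^T u| \le 3\}}\big) z_j z_j^T - \big((\beta_4 - \beta_2)uu^T + \beta_2 I\big)\Big\| \le \epsilon$$

with probability at least $1 - 2e^{-\gamma(\epsilon) m}$ provided $m \ge C_0(\epsilon) n$.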

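Proof By rotational invariance, we can assume $u = e_1$. Define a matrix $D = \operatorname{diag}\big(\frac{1}{\sqrt{\beta_4}}, \frac{1}{\sqrt{\beta_2}}, \ldots, \frac{1}{\sqrt{\beta_2}}\big)$.
Define $w_j = D\,|z_{j1}|\,1_{\{|z_{j1}| \le 3\}}\, z_j$. It is immediate to check that the $w_j$'s are IID copies of a zero-mean,
isotropic and sub-Gaussian random vector $w$. Standard results about random matrices with
sub-Gaussian rows, e.g. Theorem 5.39 in [9], give

$$\Big\|\frac{1}{m}\sum_{j=1}^m w_j w_j^T - I\Big\| \le \epsilon/3,$$

which implies

$$\Big\|\frac{1}{m}\sum_{j=1}^m \big(|z_{j1}|^2 1_{\{|z_{j1}| \le 3\}}\big) z_j z_j^T - \big((\beta_4 - \beta_2)e_1 e_1^T + \beta_2 I\big)\Big\| \le \|D^{-1}\|\,(\epsilon/3)\,\|D^{-1}\| \le \epsilon,$$

with probability at least $1 - 2e^{-\gamma(\epsilon) m}$ provided that $m \ge C_0(\epsilon) n$, where $C_0$ is sufficiently large. ■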
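Proof of Lemma 2.4 It suffices to prove

$$\|Y_\Omega - X\| \le \frac{1}{20}\|X\|_F \quad \text{and} \quad \|Y_{\Omega^\perp}\|_\infty \le \frac{C_0\sqrt{\log n}}{\sqrt{m}}\|X\|_F,$$

since

$$\|Y_{T^\perp \cap \Omega}\| = \|(Y_\Omega - X)_{T^\perp}\| \le \|Y_\Omega - X\|$$

and

$$\|Y_{T \cap \Omega} - X\|_F \le \sqrt{2}\,\|Y_{T \cap \Omega} - X\| \le 2\sqrt{2}\,\|Y_\Omega - X\| \le \frac{1}{5}\|X\|_F.$$

1. $\|Y_\Omega - X\| \le \frac{1}{20}\|X\|_F$. By Lemma 2.5 we have

$$\Big\|\frac{1}{m}\sum_{j=1}^m \big(|z_{jG}^T u_a|^2 1_{\{|z_{jG}^T u_a| \le 3\}}\big) z_{jG} z_{jG}^T - \big((\beta_4 - \beta_2)u_a u_a^T + \beta_2 I_\Omega\big)\Big\| \le \epsilon, \quad a = 1, 2,$$

with probability at least $1 - 2e^{-\gamma m}$ provided $m \ge C_1 k$. Similarly, since $\frac{1}{m}\sum_{j=1}^m z_{jG} z_{jG}^T$ is Wishart
when restricted to $\Omega$, standard results in random matrix theory, e.g. Corollary 5.35 in [9], assert
that

$$\Big\|\frac{1}{m}\sum_{j=1}^m z_{jG} z_{jG}^T - I_\Omega\Big\| \le \epsilon$$

with probability at least $1 - 2e^{-\gamma m}$ provided $m \ge C_1 k$. Denote

$$W_a = \frac{1}{m(\beta_4 - \beta_2)}\sum_{j=1}^m \big(|z_{jG}^T u_a|^2 1_{\{|z_{jG}^T u_a| \le 3\}} - \beta_2\big) z_{jG} z_{jG}^T - u_a u_a^T, \quad a = 1, 2.$$

We have with probability at least $1 - 4e^{-\gamma m}$, $\|W_a\| \le \frac{(1 + \beta_2)\epsilon}{\beta_4 - \beta_2}$ provided $m \ge C_1 k$. Taking $\epsilon$ small enough, this gives us
the conclusion by noticing that

$$Y_\Omega - X = \lambda_1 W_1 + \lambda_2 W_2.$$

2. $\|Y_{\Omega^\perp}\|_\infty \le \frac{C_0\sqrt{\log n}}{\sqrt{m}}\|X\|_F$.

For any fixed $a, b \in [n]$ with $a > k$ or $b > k$, we know $Y_{ab} = e_a^T Y e_b$ is the arithmetic mean of $m$ IID
centered sub-exponential random variables, whose $\psi_1$-norm is bounded by $K(|\lambda_1| + |\lambda_2|)$ with a
numerical constant $K$. Then by Proposition 5.16 in [9], we have

$$|Y_{ab}| \le \frac{C_0\sqrt{\log n}}{\sqrt{m}}\|X\|_F,$$

with probability at least $1 - 1/n^5$, which implies our claim.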



2.4 Adaptation of the golfing scheme 

In this section we will construct the dual certificate satisfying all the properties in Lemma 2.3 by
using the golfing scheme.

Proof of Theorem 1.2 It suffices to construct $Y$ satisfying all the properties in Lemma 2.3 with
high probability. We divide the group of IID random vectors $\{z_1, \ldots, z_m\}$ into $l := \lfloor 2\log(n)\rfloor + 3$
groups

$$\{z_1^{(1)}, \ldots, z_{m_1}^{(1)}\}, \ \ldots, \ \{z_1^{(l)}, \ldots, z_{m_l}^{(l)}\}.$$

This implies that $m_1 + \ldots + m_l = m$. We use the same definition of $X_0$ as in Lemma 2.3. For $i = 1, \ldots, l$,
as in Lemma 2.4 we define the eigenvalue decomposition

$$X_{i-1} = \lambda_{1,i-1}\, u_{1,i-1} u_{1,i-1}^T + \lambda_{2,i-1}\, u_{2,i-1} u_{2,i-1}^T,$$

and

$$Y_i = f(\lambda_{1,i-1}, \lambda_{2,i-1}, u_{1,i-1}, u_{2,i-1}).$$

Moreover, we define $X_i = X_{i-1} - \mathcal{P}_{T \cap \Omega}(Y_i)$, and $Y = \sum_{i=1}^l Y_i$. By definition the $X_i$'s are in
$T \cap \Omega$, so $Y_i$ is well-defined. By Lemma 2.4, with probability at least $1 - l(4e^{-\gamma m_i} + 1/n^3)$, we
have for $i = 1, \ldots, l$

$$\|X_i\|_F \le \frac{1}{5}\|X_{i-1}\|_F, \quad \|Y_{i, T^\perp \cap \Omega}\| \le \frac{1}{10}\|X_{i-1}\|_F, \quad \text{and} \quad \|Y_{i, \Omega^\perp}\|_\infty \le \frac{C_0\sqrt{\log n}}{\sqrt{m}}\|X_{i-1}\|_F,$$

provided $m_1 \ge C_1 k, \ldots, m_l \ge C_1 k$. Therefore, $Y = v_1 z_1 z_1^T + \ldots + v_m z_m z_m^T$ and

$$\|Y_{T \cap \Omega} - X_0\|_F = \|X_l\|_F \le \Big(\frac{1}{5}\Big)^l \|X_0\|_F \le \frac{\|X_0\|_F}{6n^2}, \quad (\text{by } l \ge 2\log n + 2)$$

$$\|Y_{T^\perp \cap \Omega}\| \le \sum_{i=1}^l \|Y_{i, T^\perp \cap \Omega}\| \le \sum_{i=1}^l \frac{1}{10}\|X_{i-1}\|_F \le \sum_{i=1}^l \frac{1}{10}\Big(\frac{1}{5}\Big)^{i-1}\|X_0\|_F \le \frac{\|X_0\|_F}{5},$$

and



ir n x Hoc < ^ ii < ;l — ^ — ^ i ^ n x °n^ 



1 = 1 2=1 



When $m \ge (2\log n + 3)C_1 k$, we can always make such a division of $\{z_1, \ldots, z_m\}$, so the proof is
complete. ■
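
The bookkeeping of the golfing scheme can be sketched numerically. The code below assumes the natural logarithm, a contraction factor of $1/5$ per step and a per-step bound of $1/10$ on the $T^\perp \cap \Omega$ component, as read from the inequalities of this proof; the function names and the sizes `n`, `m` are hypothetical, and equal-sized groups are one possible choice rather than a requirement stated in the text.

```python
import math

def golfing_group_count(n):
    """Number of groups l = floor(2 log n) + 3, using the natural logarithm."""
    return math.floor(2.0 * math.log(n)) + 3

def split_into_groups(m, l):
    """Split indices 0..m-1 into l contiguous groups whose sizes differ by at most one."""
    base, extra = divmod(m, l)
    groups, start = [], 0
    for i in range(l):
        size = base + (1 if i < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups

n, m = 100, 1000                              # hypothetical sizes
l = golfing_group_count(n)
groups = split_into_groups(m, l)

contraction = (1.0 / 5.0) ** l                # ||X_l||_F / ||X_0||_F after l steps
target = 1.0 / (6.0 * n ** 2)                 # accuracy required by Lemma 2.3
tail = sum(0.1 * (1.0 / 5.0) ** (i - 1) for i in range(1, l + 1))  # geometric sum
```

With these values the geometric contraction reaches the accuracy $\|X_0\|_F / (6n^2)$ after $l$ steps, and the accumulated $T^\perp \cap \Omega$ component stays below $\|X_0\|_F / 5$.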



3 The proof of Theorem 1.3

We first prove a useful lemma: 

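Lemma 3.1 Suppose $a_j$, $j = 1, \ldots, m_1$ and $b_j$, $j = 1, \ldots, m_2$ are IID $\mathcal{N}(0, I_{N \times N})$ random vectors
in $\mathbb{R}^N$, where $m_1 \ge 0$, $m_2 \ge 0$ and $m_1 + m_2 < N$. Then there is an event

$$E = E(a_1, \ldots, a_{m_1}, b_1, \ldots, b_{m_2})$$

with probability at least $1 - m_2 e^{-0.09(N - m_1)}$, such that on $E$ we have the following property:

Any $\alpha_j \le 0$, $j = 1, \ldots, m_1$, $\beta_j \ge 0$, $j = 1, \ldots, m_2$, $\lambda \in \mathbb{R}$, $S \preceq 0$ and $L \in \mathbb{R}^{N \times N}$ symmetric satisfying

$$\sum_{j=1}^{m_1} \alpha_j a_j a_j^T + \sum_{j=1}^{m_2} \beta_j b_j b_j^T = L + S + \lambda I,$$

must also satisfy

$$\frac{N - m_1}{2}\Big(\sum_{j=1}^{m_2} \beta_j\Big) \le \lambda m_2 + \sqrt{m_2}\,\|L\|_F.$$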

1 3=1 
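Proof With probability 1 we have $a_1, \ldots, a_{m_1}, b_1, \ldots, b_{m_2}$ are linearly independent. Suppose

$$\{v_1, \ldots, v_{m_1}, v_{m_1+1}, \ldots, v_{m_1+m_2}, \ldots, v_N\}$$

is an orthonormal basis of $\mathbb{R}^N$ satisfying

$$\operatorname{span}(a_1, \ldots, a_{m_1}) = \operatorname{span}(v_1, \ldots, v_{m_1}),$$

and

$$\operatorname{span}(a_1, \ldots, a_{m_1}, b_1, \ldots, b_{m_2}) = \operatorname{span}(v_1, \ldots, v_{m_1+m_2}).$$

Then we can further assume $(v_1, \ldots, v_{m_1})$ only depend on $(a_1, \ldots, a_{m_1})$ and are independent of
$(b_1, \ldots, b_{m_2})$. Then we have

$$\begin{aligned} \Big\langle \sum_{j=m_1+1}^{m_1+m_2} v_j v_j^T, L + S + \lambda I\Big\rangle &= \Big\langle \sum_{j=m_1+1}^{m_1+m_2} v_j v_j^T, \sum_{j=1}^{m_1} \alpha_j a_j a_j^T + \sum_{j=1}^{m_2} \beta_j b_j b_j^T\Big\rangle = \Big\langle \sum_{j=m_1+1}^{m_1+m_2} v_j v_j^T, \sum_{j=1}^{m_2} \beta_j b_j b_j^T\Big\rangle \\ &= \Big\langle \sum_{j=m_1+1}^{N} v_j v_j^T, \sum_{j=1}^{m_2} \beta_j b_j b_j^T\Big\rangle = \Big\langle I - \sum_{j=1}^{m_1} v_j v_j^T, \sum_{j=1}^{m_2} \beta_j b_j b_j^T\Big\rangle \\ &= \sum_{j=1}^{m_2} \beta_j\Big(\|b_j\|_2^2 - \sum_{k=1}^{m_1} |v_k^T b_j|^2\Big). \end{aligned}$$

Since the $b_j$ are IID $\mathcal{N}(0, I)$ random vectors, and are independent from the orthonormal vectors
$v_1, \ldots, v_{m_1}$, we have

$$\|b_j\|_2^2 - \sum_{k=1}^{m_1} |v_k^T b_j|^2 \sim \chi^2(N - m_1).$$

By the Chernoff upper bound for the $\chi^2$ distribution, we have

$$\mathbb{P}\Big(\|b_j\|_2^2 - \sum_{k=1}^{m_1} |v_k^T b_j|^2 \le \frac{N - m_1}{2}\Big) \le \Big(\frac{1}{2}e^{1/2}\Big)^{(N - m_1)/2} \le e^{-0.09(N - m_1)}.$$

Then we have

$$\Big\langle \sum_{j=m_1+1}^{m_1+m_2} v_j v_j^T, L + S + \lambda I\Big\rangle \ge \frac{N - m_1}{2}\Big(\sum_{j=1}^{m_2} \beta_j\Big)$$

with probability at least $1 - m_2 e^{-0.09(N - m_1)}$.

On the other hand, we have

$$\Big\langle \sum_{j=m_1+1}^{m_1+m_2} v_j v_j^T, L + S + \lambda I\Big\rangle \le \Big\langle \sum_{j=m_1+1}^{m_1+m_2} v_j v_j^T, L + \lambda I\Big\rangle \le \lambda m_2 + \|L\|_F \sqrt{m_2},$$

which implies our claim. ■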
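
The constant $0.09$ in the Chernoff step can be checked numerically. The sketch below uses the standard lower-tail Chernoff bound $\mathbb{P}(\chi^2_d \le zd) \le (z e^{1-z})^{d/2}$ for $0 < z < 1$ with $z = 1/2$; the function name and the sample value of $d$ are illustrative.

```python
import math

def chi2_lower_tail_bound(d, z=0.5):
    """Chernoff bound (z * e^(1 - z))^(d/2) on P(chi2_d <= z * d), valid for 0 < z < 1."""
    return (z * math.exp(1.0 - z)) ** (d / 2.0)

# Decay rate per degree of freedom: the bound equals exp(-rate * d).
rate = -0.5 * math.log(0.5 * math.exp(0.5))

d = 50                                   # plays the role of N - m_1 (hypothetical value)
bound = chi2_lower_tail_bound(d)
stated = math.exp(-0.09 * d)
```

The exact rate is roughly $0.0966$, so rounding it down to $0.09$ as in the lemma is safe.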
The proof of Theorem 1.3

We start by defining the event $E = E(z_1, \ldots, z_m)$. First, we define an event

$$E_0 = \{|\langle x, z_{jG}\rangle|^2 \le 10\log n, \ j = 1, \ldots, m\}.$$

By the assumption that $\|x\|_2 = 1$ and $z_{jG} \sim \mathcal{N}(0, I_{k \times k})$, we have

$$|\langle x, z_{jG}\rangle|^2 \sim \chi^2(1),$$

which implies that F(E ) > 1 - 

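Next, for any partition of $\{1, \ldots, m\} = \{j_1, \ldots, j_{m_1}\} \cup \{k_1, \ldots, k_{m_2}\}$, where $j_1 < \ldots < j_{m_1}$, $k_1 < \ldots <
k_{m_2}$, $m_1 \ge 0$, $m_2 \ge 0$ and $m_1 + m_2 = m$, define

$$E_{\{j_1, \ldots, j_{m_1}\} \cup \{k_1, \ldots, k_{m_2}\}} = E(z_{j_1 B}, \ldots, z_{j_{m_1} B}, z_{k_1 B}, \ldots, z_{k_{m_2} B}).$$

Then by Lemma 3.1 we have

$$\mathbb{P}\big(E_{\{j_1, \ldots, j_{m_1}\} \cup \{k_1, \ldots, k_{m_2}\}}\big) \ge 1 - m_2 e^{-0.09(n - k - m_1)} \ge 1 - m e^{-0.09(n - k - m)}.$$

Now we define the event $E$ by

$$E = E_0 \cap \Big(\bigcap_{\text{all partitions of } [m]} E_{\{j_1, \ldots, j_{m_1}\} \cup \{k_1, \ldots, k_{m_2}\}}\Big).$$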

Then 

¥(E) > 1 - — - 2 m me~°- 09(n ~ fc ~ m) > 1 - — - me -0-09n+0.09fe+0.79m 
Hereafter all our discussions will be on the event E. 
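
The exponent $0.79m$ comes from the union bound over all $2^m$ partitions of $[m]$ into two ordered blocks: $2^m m e^{-0.09(n-k-m)} = m e^{-0.09n + 0.09k + (0.09 + \log 2)m}$ and $0.09 + \log 2 < 0.79$. The sketch below checks this on a logarithmic scale; the function names and the sample values of $n$, $k$, $m$ are hypothetical.

```python
import math

def union_bound_log(n, k, m):
    """log of 2^m * m * exp(-0.09 (n - k - m)), the union bound over all partitions."""
    return m * math.log(2.0) + math.log(m) - 0.09 * (n - k - m)

def stated_log(n, k, m):
    """log of m * exp(-0.09 n + 0.09 k + 0.79 m), the form used in the text."""
    return math.log(m) - 0.09 * n + 0.09 * k + 0.79 * m

per_m_rate = 0.09 + math.log(2.0)        # coefficient of m after the union bound

n, k, m = 10000, 5, 20                   # hypothetical sizes
lhs = union_bound_log(n, k, m)
rhs = stated_log(n, k, m)
```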

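We now come back to derive the necessary condition for $xx^T$ to be an optimal point of (1.4).
By section 5.9.2 of [2], the condition is

$$0 \in \partial\big(\|X\|_1 + \lambda\operatorname{Tr}(X)\big)\big|_{X = xx^T} + S + \mathcal{A}^*(v), \quad S \preceq 0, \quad \langle S, xx^T\rangle = 0,$$

which, using the definition of the subgradient, is equivalent to

$$0 \in \mathrm{sgn}(xx^T) + L_{\Omega^\perp} + \lambda I + S + \mathcal{A}^*(v), \quad S \preceq 0, \quad \langle S, xx^T\rangle = 0, \quad \|L_{\Omega^\perp}\|_\infty \le 1.$$

One can verify that $S \preceq 0$ and $\langle S, xx^T\rangle = 0$ is equivalent to $S \preceq 0$ and $\mathcal{P}_T(S) = 0$. Thus the
necessary condition for $xx^T$ to be a minimizer of this program is the existence of a dual certificate
$Y$ with the following properties:

$$Y = \sum_{j=1}^m c_j z_j z_j^T = \mathrm{sgn}(x)\,\mathrm{sgn}(x)^T + L_{\Omega^\perp} + \lambda I + S_{T^\perp}, \qquad (3.1)$$

$$\|L_{\Omega^\perp}\|_\infty \le 1, \qquad (3.2)$$

$$S_{T^\perp} \preceq 0. \qquad (3.3)$$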
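Projecting both sides of (3.1) onto $\Gamma$, we have

$$Y_\Gamma = \sum_{j=1}^m c_j z_{jB} z_{jB}^T = L_\Gamma + \lambda I_\Gamma + S_\Gamma. \qquad (3.4)$$

Since $\Gamma \subset T^\perp$, we have

$$S_\Gamma \preceq 0. \qquad (3.5)$$

It is also obvious that $\|L_\Gamma\|_\infty \le \|L_{\Omega^\perp}\|_\infty \le 1$, which implies

$$\|L_\Gamma\|_F \le (n - k)\|L_\Gamma\|_\infty \le n - k, \quad \text{and} \quad \operatorname{Tr}(L_\Gamma) \le n - k.$$

On the other hand, projecting both sides of (3.1) onto $T$, we have

$$Y_T = \|x\|_1\big(\mathrm{sgn}(x)\,x^T + x\,\mathrm{sgn}(x)^T\big) - \|x\|_1^2\, xx^T + L_{T \cap \Omega^\perp} + \lambda xx^T,$$

and

$$Y_{T \cap \Omega} = \|x\|_1\big(\mathrm{sgn}(x)\,x^T + x\,\mathrm{sgn}(x)^T\big) - \|x\|_1^2\, xx^T + \lambda xx^T,$$

which implies

$$x^T Y_{T \cap \Omega}\, x = \sum_{j=1}^m c_j |\langle x, z_{jG}\rangle|^2 = \|x\|_1^2 + \lambda\|x\|_2^2 = \|x\|_1^2 + \lambda.$$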

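Case 1: $\lambda < -\frac{k}{2}$.

By the assumption $k \le m \le \frac{n}{40\log n}$, we can assume the eigenvalue decomposition

$$\sum_{j=1}^m c_j z_{jB} z_{jB}^T = \nu_1 u_1 u_1^T + \ldots + \nu_m u_m u_m^T + 0 \cdot u_{m+1} u_{m+1}^T + \ldots + 0 \cdot u_{n-k} u_{n-k}^T,$$

where $\{u_1, \ldots, u_{n-k}\}$ is an orthonormal basis of $\operatorname{span}(e_{k+1}, \ldots, e_n)$. Then by (3.4), we have

$$u_j^T (L_\Gamma + \lambda I_\Gamma + S_\Gamma) u_j = u_j^T \Big(\sum_{i=1}^m c_i z_{iB} z_{iB}^T\Big) u_j = 0,$$

for $j = m + 1, \ldots, n - k$. By (3.5), we have

$$u_j^T L_\Gamma u_j = -u_j^T (\lambda I_\Gamma + S_\Gamma) u_j \ge -\lambda > \frac{k}{2}, \quad j = m + 1, \ldots, n - k.$$

Since $\{u_1, \ldots, u_{n-k}\}$ is an orthonormal basis of $\operatorname{span}(e_{k+1}, \ldots, e_n)$, we have

$$\sum_{j=1}^{n-k} u_j^T L_\Gamma u_j = \Big\langle L_\Gamma, \sum_{j=1}^{n-k} u_j u_j^T\Big\rangle = \operatorname{Tr}(L_\Gamma) \le n - k.$$

Combining the last two displays with the assumption $4 \le k \le m \le \frac{n}{40\log n}$, we have

$$\sum_{j=1}^m u_j^T L_\Gamma u_j \le n - k - (n - k - m)\frac{k}{2} < 0. \qquad (3.9)$$

On the other hand

$$\Big|\sum_{j=1}^m u_j^T L_\Gamma u_j\Big| = \Big|\Big\langle L_\Gamma, \sum_{j=1}^m u_j u_j^T\Big\rangle\Big| \le \|L_\Gamma\|_F \Big\|\sum_{j=1}^m u_j u_j^T\Big\|_F \le (n - k)\sqrt{m}. \qquad (3.10)$$

By (3.9) and (3.10), we have $(n - k)\sqrt{m} \ge (n - k - m)\frac{k}{2} - (n - k)$, which implies

$$\sqrt{m} \ge \frac{(n - k - m)\,k}{2(n - k)} - 1.$$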



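Case 2: $\lambda \ge -\frac{k}{2}$.

Let $I_+ = \{j \in \{1, 2, \ldots, m\} : c_j \ge 0\}$ and $I_- = \{j \in \{1, 2, \ldots, m\} : c_j < 0\}$. By the identity for
$x^T Y_{T \cap \Omega}\, x$ above and the definition of $E \subset E_0$, we have

$$\|x\|_1^2 + \lambda \le 10\log(n) \sum_{j \in I_+} c_j. \qquad (3.11)$$

By (3.4),

$$Y_\Gamma = \sum_{j \in I_+} c_j z_{jB} z_{jB}^T + \sum_{j \in I_-} c_j z_{jB} z_{jB}^T = L_\Gamma + \lambda I_\Gamma + S_\Gamma.$$

By the definition of $E$ and Lemma 3.1, we have

$$\frac{n - k - |I_-|}{2} \sum_{j \in I_+} c_j \le \lambda |I_+| + \sqrt{|I_+|}\,\|L_\Gamma\|_F. \qquad (3.12)$$

Notice that $\|L_\Gamma\|_F \le (n - k)\|L_\Gamma\|_\infty \le n - k$. By (3.11) and (3.12),

$$\frac{(n - k - m)\|x\|_1^2}{20\log n} + \lambda\Big(\frac{n - k - m}{20\log n} - m\Big) \le \sqrt{m}\,(n - k).$$

By the assumption that $k \le m \le \frac{n}{40\log n}$ and $\lambda \ge -\frac{k}{2}$, we have

$$\frac{(n - k - m)(\|x\|_1^2 - k/2)}{20\log n} \le \sqrt{m}\,(n - k),$$

which implies

$$m \ge \frac{\max(\|x\|_1^2 - k/2, 0)^2}{500\log^2 n}.$$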

Therefore, by putting Case 1 and Case 2 together, we have 

. (,k 2 max(||x|| 2 -fc/2,0) 2 

m > mm — — 1 , „ 

V 4 500 log 2 re 
4 Discussion



We provide theoretical guarantees on the recovery of a sparse signal from quadratic Gaussian 
measurements via convex programming and show that our results are sharp for a class of recently 
proposed convex relaxations. For this model, unlike classical compressed sensing, compressive phase 
retrieval imposes a stricter limitation on the number of measurements needed for recovery via naive 
convex relaxation than is needed for well-posedness. This leads to a natural open question: can we 
narrow the gap by using other convex programs besides (1.4)?

Theorem 1.3 shows the limitations of (1.4) in the sense of exact recovery, since we only need to
recover the support of the unknown vector to recover x by using the PhaseLift algorithm [HE] to 
solve the resulting overdetermined system of quadratic equations. Mathematically, recovering the 
support is at least as easy as exact recovery. Can we do better than (1.4) by formulating the right
support recovery problem? We leave these considerations for future research. 